A Comparison of Algorithms for Learning with Nonconvex Regularization

Authors

  • Quanming Yao
  • Yongqi Zhang
Abstract

Convex regularizers are popular for sparse and low-rank learning, mainly due to their nice statistical and optimization guarantees. However, they often lead to biased estimation, so the resulting sparsity and accuracy are not as good as desired. This motivates replacing convex regularizers with nonconvex ones, and recently many nonconvex regularizers have been proposed that indeed achieve better performance than convex ones. In this survey, we summarize and compare algorithms that can be applied to learning with nonconvex regularizers.

1 General Non-Convex Optimization

Consider the general non-convex optimization problem

    min_x g(x)    (1)

(All functions mentioned in this paper are assumed to be bounded from below.)

Assumption 1 (Difference of Convex [1, 18, 14]). g(x) can be decomposed as the difference of two convex (DoC) functions, i.e. g(x) = ĝ(x) − ḡ(x).

We are mainly interested in non-smooth g(x); otherwise, (1) can be solved simply with gradient descent. With Assumption 1, (1) can be expressed as min_x { g(x) ≡ ĝ(x) − ḡ(x) }, where ĝ(x) and ḡ(x) are two convex functions.

Definition 1 (Critical point [1]). Under Assumption 1, a point x* is a critical point of g(x) if it satisfies 0 ∈ ∂ĝ(x*) − ∂ḡ(x*).

Remark. For smooth functions, a critical point is defined as a point where the gradient is zero.

Difference of Convex programming (DCP) [1] is a general technique for optimizing (1). Basically, it uses a first-order expansion to approximate ḡ(x) and generates the sequence {x_t, y_t} by

    x_{t+1} = argmin_x ĝ(x) − ⟨x − x_t, y_t⟩,  where y_t ∈ ∂ḡ(x_t).    (2)

Although g(x) is not convex, the sub-problem in (2) is convex, so algorithms for convex optimization can be applied. However, the sub-problem may not be easy to solve, and this makes DCP slow in real-world applications. This method is used in [4, 19] for sparse coding with non-convex regularizers.

Theorem 1.1 (Convergence of DCP). Let Assumption 1 hold; then the sequence {x_t} generated by (2) converges to a critical point of (1).
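To make the DCP update (2) concrete, the sketch below applies it to a small ℓ1-2 regularized denoising problem, min_x 0.5‖x − b‖² + λ(‖x‖₁ − ‖x‖₂), taking ĝ(x) = 0.5‖x − b‖² + λ‖x‖₁ and ḡ(x) = λ‖x‖₂. This toy instance, the NumPy implementation, and the closed-form soft-thresholding solution of the convex sub-problem are illustrative assumptions made here; it is a minimal sketch of iteration (2), not the implementation compared in the survey.

import numpy as np

def soft_threshold(z, tau):
    # Closed-form solution of min_x 0.5*||x - z||^2 + tau*||x||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def dcp_l1_minus_l2(b, lam=0.5, max_iter=100, tol=1e-8):
    # DCP iteration (2) for min_x 0.5*||x - b||^2 + lam*(||x||_1 - ||x||_2),
    # with g_hat(x) = 0.5*||x - b||^2 + lam*||x||_1 and g_bar(x) = lam*||x||_2.
    x = np.zeros_like(b)
    for _ in range(max_iter):
        # y_t in the subdifferential of g_bar at x_t (0 is a valid choice at x_t = 0).
        norm_x = np.linalg.norm(x)
        y = lam * x / norm_x if norm_x > 0 else np.zeros_like(x)
        # Convex sub-problem: argmin_x g_hat(x) - <x - x_t, y_t>
        #                   = argmin_x 0.5*||x - (b + y)||^2 + lam*||x||_1,
        # solved exactly by soft-thresholding.
        x_new = soft_threshold(b + y, lam)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b = rng.normal(size=10)
    print(dcp_l1_minus_l2(b, lam=0.5))

Here each sub-problem happens to have a closed-form solution; for a general convex ĝ(x) the sub-problem must itself be handled by an iterative convex solver, which is the source of DCP's slowness noted above.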


Related papers

Optimal Computational and Statistical Rates of Convergence for Sparse Nonconvex Learning Problems.

We provide theoretical analysis of the statistical and computational properties of penalized M-estimators that can be formulated as the solution to a possibly nonconvex optimization problem. Many important estimators fall in this category, including least squares regression with nonconvex regularization, generalized linear models with nonconvex regularization and sparse elliptical random design...


Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, for a large class of nonconvex regularizers, we propose to move the nonconvexity fro...


Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descen...


Fast Learning with Nonconvex L1-2 Regularization

Convex regularizers are often used for sparse learning. They are easy to optimize, but can lead to inferior prediction performance. The difference of ℓ1 and ℓ2 (ℓ1-2) regularizer has been recently proposed as a nonconvex regularizer. It yields better recovery than both ℓ0 and ℓ1 regularizers on compressed sensing. However, how to efficiently optimize its learning problem is still challenging. T...


Regularized factor models

This dissertation explores regularized factor models as a simple unification of machine learning problems, with a focus on algorithmic development within this known formalism. The main contributions are (1) the development of generic, efficient algorithms for a subclass of regularized factorizations and (2) new unifications that facilitate application of these algorithms to problems previously ...




Publication date: 2016